Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions
Authors
Abstract
Intron prediction is an important part of genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. Each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc found the correct introns with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. We further evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame. Intron lengths therefore fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our results showed that 80% (8/10) of the species had similar proportions of the three phases. The percentage of 3n introns was excessive in Ostreococcus lucimarinus (47.7%) and deficient in Ostreococcus tauri (29.1%). These discrepancies could be the result of errors in intron prediction. We suggest that three-phase evaluation is a fast and effective method of detecting intron annotation problems.
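The phase classification described above reduces to taking each intron length modulo 3. The Python sketch below is a hypothetical illustration, not the authors' pipeline. It does three things:

- tallies the three phases;
- flags any phase whose share departs from the expected one-third by more than a tolerance;
- computes how a predictor's introns compare with a reference set.

Several details are assumptions because the abstract does not state the criteria the study used:

- the intron tuple layout (chromosome, strand, 1-based inclusive start and end);
- the exact-coordinate matching rule;
- the 3-percentage-point tolerance.

```python
from collections import Counter

# Hypothetical intron representation: (chromosome, strand, start, end),
# with 1-based inclusive coordinates (an assumption, as in GFF-style files).
Intron = tuple[str, str, int, int]

PHASES = ((0, "3n"), (1, "3n+1"), (2, "3n+2"))


def intron_length(intron: Intron) -> int:
    """Length of an intron given 1-based inclusive start/end coordinates."""
    _, _, start, end = intron
    return end - start + 1


def phase_percentages(lengths: list[int]) -> dict[str, float]:
    """Percentage of introns whose length is 3n, 3n+1, or 3n+2."""
    if not lengths:
        raise ValueError("no intron lengths given")
    counts = Counter(length % 3 for length in lengths)
    total = len(lengths)
    return {label: 100.0 * counts[r] / total for r, label in PHASES}


def flag_phase_imbalance(percentages: dict[str, float],
                         tolerance: float = 3.0) -> dict[str, float]:
    """Return phases whose share differs from 1/3 by more than `tolerance`
    percentage points. The tolerance is a hypothetical threshold."""
    expected = 100.0 / 3
    return {label: pct for label, pct in percentages.items()
            if abs(pct - expected) > tolerance}


def error_rates(predicted: set[Intron],
                reference: set[Intron]) -> tuple[float, float]:
    """Compare predicted introns with a reference set by exact coordinates.

    Returns (false-negative rate, false-positive share), both in percent:
    - false-negative rate: reference introns missed by the predictor;
    - false-positive share: predicted introns absent from the reference.
    """
    if not predicted or not reference:
        raise ValueError("both intron sets must be non-empty")
    fn_rate = 100.0 * len(reference - predicted) / len(reference)
    fp_share = 100.0 * len(predicted - reference) / len(predicted)
    return fn_rate, fp_share


if __name__ == "__main__":
    toy_lengths = [90, 91, 92, 93, 100, 101]  # toy data, one of each phase x2
    pct = phase_percentages(toy_lengths)
    print(pct, flag_phase_imbalance(pct))
```

With a 3-point tolerance, the 3n shares reported above for O. lucimarinus (47.7%) and O. tauri (29.1%, about 4.2 points under one-third) would both be flagged. A real analysis would more likely use a statistical test scaled to the number of introns in each genome. Note that the second value returned by `error_rates` is the fraction of predictions that are wrong. This is not a classical false-positive rate, which would require a count of true negatives.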
Similar resources
Intron length distributions and gene prediction
Accurate gene prediction in eukaryotes is a difficult and subtle problem. Here we point out a useful feature of expected distributions of spliceosomal intron lengths. Since introns are removed from transcripts prior to translation, intron lengths are not expected to respect coding frame, thus the number of genomic introns that are a multiple of three bases ('3n introns') should be similar to th...
Loss of Chloroplast trnL(UAA) Intron in Two Species of Hedysarum (Fabaceae): Evolutionary Implications
Previous studies have indicated that in all land plants examined to date, the chloroplast gene trnL(UAA) is interrupted by a single group I intron ranging from 250 to over 1400 bp. The parasitic Epifagus virginiana has lost, however, the entire gene. We report that the intron is missing from the chloroplast genome of two arctic species of the legume genus Hedysarum (H. alpinum, H. ...
Genome architecture - number, size and length distributions of exons and introns in six crown eukaryotic genomes
The genome profile of exon-intron distributions is presented for six eukaryotic genomes (H. sapiens, P. troglodytes, M. musculus, D. rerio, C. elegans, D. melanogaster) to deduce similarities and differences among their genome design and architecture. Interestingly, in all the six genomes, the total length in exons, introns and intergenic DNA on each chromosome is significantly correlated to th...
Using intron position conservation for homology-based gene prediction
Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here, we present a homology-based gene prediction program called GeMoMa. GeMoMa utilizes the conservat...
The presence of GC-AG introns in Neurospora crassa and other euascomycetes determined from analyses of complete genomes: implications for automated gene prediction.
A combination of experimental and computational approaches was employed to identify introns with noncanonical GC-AG splice sites (GC-AG introns) within euascomycete genomes. Evaluation of 2335 cDNA-confirmed introns from Neurospora crassa revealed 27 such introns (1.2%). A similar frequency (1.0%) of GC-AG introns was identified in Fusarium graminearum, in which 3 of 292 cDNA-confirmed introns ...